A Dynamic Method for the Evaluation and Comparison of Imputation Techniques
Abstract
Imputation of missing data is important in many areas, such as reducing non-response bias in surveys and maintaining medical documentation. Estimating the uncertainty inherent in the imputed values is one way of evaluating the results of the imputation process. This paper presents a new method for the estimation of imputation uncertainty, which can be implemented as part of any imputation method, and which can be used to estimate the accuracy of the imputed values generated by both parametric and non-parametric imputation techniques. The proposed approach can be used to assess the feasibility of the imputation process for large complex datasets, and to compare the effectiveness of candidate imputation methods when they are applied to the same dataset. Current uncertainty estimation methods are described and their limitations are discussed. The ideas underpinning the proposed approach are explained in detail, and a case study is presented which shows how the new method has been applied in practice.
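The abstract does not give the details of the proposed uncertainty estimate, so the following is not the paper's method. It is only a minimal sketch of a common, generic way to compare candidate imputation methods on the same dataset: hide a random subset of known values, impute them, and measure the error on the hidden entries. The function names (`mean_impute`, `median_impute`, `masked_rmse`), the toy data, and the 20% masking fraction are all illustrative assumptions.

```python
import random
import statistics


def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def median_impute(values):
    """Replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def masked_rmse(complete, impute, mask_fraction=0.2, seed=0):
    """Hide a random subset of known values, impute them, and return
    the root-mean-square error on the hidden entries only."""
    rng = random.Random(seed)
    n_hidden = max(1, int(len(complete) * mask_fraction))
    hidden = rng.sample(range(len(complete)), n_hidden)
    hidden_set = set(hidden)
    masked = [None if i in hidden_set else v for i, v in enumerate(complete)]
    imputed = impute(masked)
    squared = [(imputed[i] - complete[i]) ** 2 for i in hidden]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical toy data with one outlier; the same seed hides the same
# entries for every method, so the scores are directly comparable.
data = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 40.0]
candidates = [("mean", mean_impute), ("median", median_impute)]
scores = {name: masked_rmse(data, fn) for name, fn in candidates}
for name, score in scores.items():
    print(f"{name}: RMSE on hidden values = {score:.3f}")
```

Because every candidate is scored on the same hidden entries, a lower score suggests a better fit for this particular dataset and masking pattern. A single masking round is noisy, so in practice the check would be repeated over several seeds.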
Similar resources
An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods
Imputation is one of the most common methods for reducing the effects of item non-response. Imputation produces a complete data set, to which naïve estimators can then be applied. Under most common imputation methods, the estimators of the mean and total (imputation estimators) remain unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...
Imputation of parent-offspring trios and their effect on accuracy of genomic prediction using Bayesian method
The objective of this study was to evaluate the imputation accuracy of parent-offspring trios under different scenarios. Using simulated datasets, the performance of Bayesian LASSO in genomic prediction was also examined. The genome consisted of 5 chromosomes, each 1 Morgan in length. The number of SNPs per chromosome was 10,000. One hundred QTLs were randomly distributed a...
The Comparison of Direct and Indirect Optimization Techniques in Equilibrium Analysis of Multibody Dynamic Systems
The present paper describes a set of procedures for solving nonlinear static-equilibrium problems in complex multibody mechanical systems. To find the equilibrium position of the system, five optimization techniques are used to minimize the total potential energy of the system. Comparisons are made between these techniques. A computer program is developed to evaluate the equality co...
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is challenging and needs careful consideration before learning from or predicting time series. Considerable research has been done on the use of diffe...
Effect of Reference Population Size and Imputation Methods on the Accuracy of Imputation in Pure and Mixed Populations
Imputation from low-density to high-density chips has been introduced to increase the accuracy of genomic selection in animals. In the current study, to investigate imputation accuracy, three populations, mixed (scenario 1), pure (scenario 2), and mixed + pure (scenario 3), were simulated using QMSim. Two imputation methods, Beagle and FImpute, were used fo...
Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most geochemical datasets contain missing data in varying proportions, which may cause significant problems in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
Publication date: 2007